Deciding how to address the multiple issues that can arise after a first run through the assumptions of the Rasch model is not simple, and there is often more than one way to obtain a scale that fits. As a rule, a well-fitting scale should be obtained while preserving as many of the scale's original features as possible.
Typically, articles report a) the fit of the scale at the start of the analysis, with a focus on the results of testing each of the Rasch assumptions, and b) the fit at the end of the calibration process, once all violations of the assumptions have been addressed. How to get from a) to b) most efficiently is not written down anywhere, but some general advice is available.
One general recommendation is to start with the assumption of local item dependence (LID). Strong LID can go along with multidimensionality and can also cause the item thresholds to malfunction. Depending on the extent of the LID, aggregating the dependent items into testlets can resolve the multidimensionality.

Once the questionnaire is free of LID and found to be unidimensional, the interpretation of the reliability starts to make sense. At this point, the item fit statistics can be investigated further to make sure that all items or testlets work well to measure the construct.

Finally, the analysis of DIF is undertaken. Depending on the purpose of the questionnaire, it is worthwhile to gather information on how the items work for different person subgroups or different assessment situations, and to make sure that DIF does not indicate an unfair treatment of some subgroups.
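The aggregation into testlets mentioned above amounts to summing the locally dependent items into a single polytomous super-item. A minimal base-R sketch with two hypothetical 5-category items (scored 0-4; the item vectors are made up for illustration):

```r
# two hypothetical, locally dependent items scored 0-4
item_a <- c(0, 2, 4, 3, 1)
item_b <- c(1, 2, 3, 4, 0)
# the testlet is their sum, a single super-item scored 0-8
testlet_ab <- item_a + item_b
testlet_ab
# [1] 1 4 7 7 1
```

The testlet then replaces the two single items in the response matrix before the model is re-estimated.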
The complexity of the analysis and the number of models undertaken can challenge the clarity of the reporting of a Rasch analysis.
Schematizations can be helpful.
Example PTGI (Kunz 2019) - Conceptualization of the analysis approach
Also, instead of reporting each small adjustment step and test, only the fit statistics at the start and in the final Rasch model can be shown in an analysis summary table.
Example WHODAS 2.0 (Chiu 2019) - Summarizing
During the course, the following issues were found for the SRG-scale.
In the exercise of seminar 7, LID-continued, creating a testlet for SRG15 and SRG13 resulted in bad fit. As an exception, I would suggest removing SRG15 from the scale. Usually, one would be cautious with the deletion of items, especially when testing the metric properties of a scale that is already established in practice and research. When developing a new scale, where items are still being selected, the deletion of misfitting items is less problematic.
Also, with regard to DIF, let’s assume that we want to come up with one metric for the entire SCI sample, and that the systematic gender difference in item SRG8 is not understood as favoring one of the subgroups. In summary, and to keep the example simple, we will not split the item and keep just one difficulty estimate for SRG8.
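For completeness, a hedged sketch of how such a DIF check can be run in eRm, here on simulated data rather than the SRG sample (the grouping vector grp is a made-up stand-in for gender):

```r
library(eRm)

set.seed(1)
X   <- sim.rasch(200, 10)                 # 200 persons, 10 dichotomous Rasch items
mod <- RM(X)                              # dichotomous Rasch model
grp <- factor(rep(c("f", "m"), each = 100))  # hypothetical gender variable
# likelihood-ratio test with gender as external split criterion;
# a non-significant p-value would indicate no evidence of DIF for grp
LRtest(mod, splitcr = grp)
```

For the polytomous SRG items the same call would be applied to the PCM object instead of a dichotomous RM fit.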
First load the data but remove item SRG15:
urlfile = "https://raw.githubusercontent.com/CarolinaFellinghauer/UNIZH_HS2021_Rasch/main/Data/SRG_Data_Course_UZH_HS2021.csv"
srg.data=read.csv(url(urlfile))
dim(srg.data)
## [1] 450 26
colnames(srg.data)
## [1] "X" "ID" "Age"
## [4] "Gender" "Completeness" "para.tetra_1"
## [7] "traumatic_nontraumatic" "PersStat" "SRG1"
## [10] "SRG2" "SRG3" "SRG4"
## [13] "SRG5" "SRG6" "SRG7"
## [16] "SRG8" "SRG9" "SRG10"
## [19] "SRG11" "SRG12" "SRG13"
## [22] "SRG14" "SRG15" "TP"
## [25] "ID_Unik" "wgt"
srg.items=c("SRG1", "SRG2", "SRG3", "SRG4", "SRG5", "SRG6", "SRG7", "SRG8", "SRG9", "SRG10", "SRG11", "SRG12", "SRG13", "SRG14") # minus "SRG15"
# dataset with the SRG items only (SRG15 excluded)
data.srg = srg.data[,srg.items]
Now, the analysis is run again without item SRG15:
library(eRm)
library(iarm)
PCM.srg.2 = PCM(data.srg[,srg.items], sum0 = TRUE)
#Targeting and reliability
scale.properties = test_prop(PCM.srg.2)
Separation Reliability Test difficulty Test target
0.8975534 0.0350000 0.0970000
Test information
5.7793920
scale.properties[c(1,3)]
Separation Reliability Test target
0.8975534 0.0970000
#Person-Item Map
plotPImap(PCM.srg.2, sort = TRUE, main = "SRG-metric")
PP.srg.2 = person.parameter(PCM.srg.2)
resid.srg.2 = residuals(PP.srg.2)
#Item Fit
eRm::itemfit(PP.srg.2)
Itemfit Statistics:
Chisq df p-value Outfit MSQ Infit MSQ Outfit t Infit t Discrim
SRG1 403.527 430 0.816 0.936 0.975 -0.930 -0.382 0.554
SRG2 481.746 430 0.043 1.118 1.115 0.996 1.537 0.481
SRG3 479.234 429 0.047 1.114 1.079 1.593 1.302 0.543
SRG4 383.938 430 0.946 0.891 0.926 -1.538 -1.243 0.618
SRG5 420.058 432 0.651 0.970 1.017 -0.340 0.306 0.586
SRG6 401.453 432 0.851 0.927 0.917 -0.915 -1.369 0.628
SRG7 344.610 429 0.999 0.801 0.829 -3.075 -2.973 0.664
SRG8 396.960 432 0.886 0.917 0.938 -0.925 -0.957 0.606
SRG9 397.358 432 0.883 0.918 0.923 -1.236 -1.297 0.602
SRG10 378.226 430 0.966 0.878 0.878 -1.933 -2.040 0.598
SRG11 384.962 432 0.949 0.889 0.897 -1.418 -1.726 0.647
SRG12 401.636 427 0.806 0.938 0.961 -0.538 -0.543 0.583
SRG13 514.941 432 0.004 1.189 1.110 2.499 1.708 0.384
SRG14 362.936 432 0.993 0.838 0.855 -2.446 -2.522 0.655
#LID - local item dependencies
cor.resid.srg.2 = cor(resid.srg.2, use = "pairwise.complete.obs")
cor.resid.srg.2.tri = cor.resid.srg.2
cor.resid.srg.2.tri[upper.tri(cor.resid.srg.2.tri, diag = TRUE)] = NA
which(cor.resid.srg.2.tri > 0.2, arr.ind = TRUE)
row col
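The empty result above means that no residual correlation exceeds 0.2. To make the screening logic transparent, here is a toy illustration with a hypothetical 3x3 residual correlation matrix in which one item pair is flagged:

```r
# hypothetical residual correlations for three items I1-I3
m <- matrix(c(1.00, 0.35, 0.05,
              0.35, 1.00, -0.10,
              0.05, -0.10, 1.00), nrow = 3,
            dimnames = list(paste0("I", 1:3), paste0("I", 1:3)))
m[upper.tri(m, diag = TRUE)] <- NA   # keep each item pair only once
which(m > 0.2, arr.ind = TRUE)       # flag pairs with correlation > 0.2
#    row col
# I2   2   1
```

Here the pair I2/I1 would be a candidate for a testlet.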
#PCA eigenvalues
eigen(cor.resid.srg.2)$values
[1] 1.82693212 1.67990879 1.47913292 1.23935187 1.17468146 1.05375863
[7] 0.96486843 0.89601162 0.86659402 0.84033068 0.72041348 0.67329405
[13] 0.57098978 0.01373214
#thresholds
thres_map_fct = "https://raw.githubusercontent.com/CarolinaFellinghauer/UNIZH_HS2020_Rasch/master/RFunctions/threshold_map_fct.r"
source(url(thres_map_fct))
ThresholdMap(thresholds(PCM.srg.2))
The deletion of SRG15 resulted in:
In principle, once the scale has been calibrated with the Rasch model, a transformation table is created that links the raw scores of the final scale to the corresponding logit-scaled ability estimates, and the logit scores to user-friendly rescaled scores. The user-friendly scores typically range from 0 to 100, which allows scores to be expressed as a percentage of the maximum obtainable score. A transformed range from 0 to 100 only makes sense if the original range is already large; one would not rescale to 0-100 if the original instrument score range is very small. Otherwise, another convenient score range is selected.
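The rescaling itself is a simple linear transformation mapping the minimum logit to 0 and the maximum to 100, which is what scales::rescale() computes. A small base-R check, using three logit values that appear in the transformation table further down:

```r
# linear 0-100 rescaling written out by hand
logits   <- c(-4.47, 0, 4.87)
rescaled <- (logits - min(logits)) / (max(logits) - min(logits)) * 100
round(rescaled, 2)
# [1]   0.00  47.86 100.00
```

The middle value reproduces the 47.86 shown for a logit score of 0.00 in the table.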
library(scales)
names(PP.srg.2)
[1] "X" "X01" "X.ex" "W" "model"
[6] "loglik" "loglik.cml" "npar" "iter" "betapar"
[11] "thetapar" "se.theta" "theta.table" "pred.list" "hessian"
[16] "mpoints" "pers.ex" "gmemb"
T.Table = as.data.frame(cbind(PP.srg.2$pred.list[[1]]$x, PP.srg.2$pred.list[[1]]$y))
colnames(T.Table) = c("Raw Score", "Logit Score")
# create a rescaled Rasch score in a convenient range, here from 0 to 100
Transformed_Score = scales::rescale(T.Table[,2], to = c(0, 100))
T.Table = cbind(T.Table, Transformed_Score)
colnames(T.Table) = c("Raw Scores", "Logit Scores", "0-100 Scores")
# round the last two columns to two decimals
T.Table[,c(2,3)] = round(T.Table[, c(2,3)], 2)
T.Table
Raw Scores Logit Scores 0-100 Scores
1 0 -4.47 0.00
2 1 -3.57 9.66
3 2 -2.74 18.53
4 3 -2.22 24.12
5 4 -1.83 28.32
6 5 -1.51 31.75
7 6 -1.23 34.68
8 7 -0.99 37.29
9 8 -0.77 39.67
10 9 -0.56 41.87
11 10 -0.37 43.95
12 11 -0.18 45.94
13 12 0.00 47.86
14 13 0.18 49.75
15 14 0.35 51.60
16 15 0.52 53.46
17 16 0.70 55.32
18 17 0.87 57.21
19 18 1.05 59.15
20 19 1.24 61.17
21 20 1.44 63.29
22 21 1.65 65.55
23 22 1.88 68.01
24 23 2.14 70.76
25 24 2.43 73.93
26 25 2.79 77.78
27 26 3.27 82.88
28 27 4.04 91.07
29 28 4.87 100.00
An interesting application of Rasch- (and IRT-) derived parameters is found in computer adaptive testing (CAT). Computer-based testing is a broad field that includes linear and adaptive testing.
In linear testing, the same test questions are administered in the same order to all respondents, similar to a standard paper-based test. In addition to what a paper-and-pencil test offers, the computer can immediately process the responses and compute the respondents’ scores.
Adaptive testing is a type of testing in which the scale adjusts to the ability of the respondent. The questions a respondent receives are selected based on the previous responses; in that sense, the test adapts to the response pattern and the ability of the respondent. The goal of a CAT is to select items that reduce the standard error of measurement and help obtain a stable ability estimate. Typically, when the ability estimate varies only within a small margin of error, the test can be stopped. Computer-adaptive testing offers several advantages, such as shorter test delivery times and immediate score reporting to candidates.
Boston University: School of Public Health
In R, the package mirtCAT, from the same authors as mirt, allows adaptive testing using item parameters obtained from mirt or from any other IRT software (via manual entry of the difficulty parameters). The package mirtCAT can also be used for multidimensional testing. While other CAT packages are available in R, mirtCAT can additionally generate a user-friendly interface to administer a CAT.
Building an interface for a CAT requires specifying:

- the first item to administer (start_item = )
- the item selection criterion (criteria = )
- the ability estimation method (method = )
- the test design, e.g. the stopping rules (design = list())

The values that these settings can take can be found by typing ?mirtCAT.